Exploring Medalla data

Initial exploration.

Barnabé Monnot (https://twitter.com/barnabemonnot), Robust Incentives Group, Ethereum Foundation (https://github.com/ethereum/rig)
2020-10-28


In this notebook we explore data from the Medalla testnet. We look at the first 388001 slots (slots 0 to 388000).

Data sources

Lighthouse block export

We use a fork of Lakshman Sankar’s Lighthouse block exporter to export attestations and blocks from the finalised chain until slot 388000.

We present the main datasets below:

all_ats

Each row in this dataset corresponds to an aggregate attestation included in a block.

exploded_ats

We cast the dataset above into a long format, such that each row corresponds to an individual attestation included in a block. Note that when this individual attestation is included multiple times over multiple aggregates, it appears multiple times in the dataset.
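As a minimal sketch of this long-format cast, the snippet below expands each aggregate row into one row per attesting validator. The field names (`slot` for the including block, `att_slot`, `committee_index`, `attesting_indices`) and the toy rows are assumptions made for illustration, not the notebook's actual schema.

```python
def explode_aggregates(aggregates):
    """Return one row per (aggregate, attesting validator) pair."""
    rows = []
    for agg in aggregates:
        for validator_index in agg["attesting_indices"]:
            # Copy every attribute except the list we are exploding.
            row = {k: v for k, v in agg.items() if k != "attesting_indices"}
            row["validator_index"] = validator_index
            rows.append(row)
    return rows

# Toy data (hypothetical): validator 7 is included in two different aggregates.
all_ats = [
    {"slot": 64, "att_slot": 63, "committee_index": 0, "attesting_indices": [3, 7]},
    {"slot": 65, "att_slot": 63, "committee_index": 0, "attesting_indices": [7]},
]
exploded_ats = explode_aggregates(all_ats)
```

Validator 7 ends up with two rows here, which is the duplication the note above warns about.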

individual_ats

exploded_ats is the “disaggregated” version of the aggregate attestations. To check for validator performance, we often don’t need to check for every inclusion of their individual attestations. individual_ats contains these unique, individual attestations, tagged with some extra data such as their earliest inclusion and whether they attested correctly for the target checkpoint and the head.
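One way to derive such a dataset, sketched under the assumption that a row is uniquely identified by the validator index and the attested slot, is to keep the earliest inclusion of each vote. The field names and the `inclusion_delay` column (inclusion slot minus attested slot) are illustrative.

```python
def earliest_inclusions(exploded_rows):
    """Keep, for each (validator, attested slot), the row included earliest."""
    earliest = {}
    for row in exploded_rows:
        key = (row["validator_index"], row["att_slot"])
        if key not in earliest or row["slot"] < earliest[key]["slot"]:
            earliest[key] = row
    # Tag each surviving row with its inclusion delay, in slots.
    return [
        {**row, "inclusion_delay": row["slot"] - row["att_slot"]}
        for row in earliest.values()
    ]

# Toy data (hypothetical): validator 7's vote for slot 63 is included twice.
exploded_rows = [
    {"validator_index": 7, "att_slot": 63, "slot": 65},
    {"validator_index": 7, "att_slot": 63, "slot": 64},
    {"validator_index": 3, "att_slot": 63, "slot": 64},
]
individual_ats = earliest_inclusions(exploded_rows)
```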

Weald dump

Jim McDonald, from Attestant, kindly provided a treasure trove of data on the #medalla-data-challenge channel of the EthStaker Discord server. The two previous datasets could have legitimately been mined from Jim’s data, but we like to get our hands dirty.

all_cms

Not too dirty though: obtaining the past record of committees (which validators are supposed to attest when) is much more computationally intensive, since it requires access to past states. Moreover, with 12125 epochs in our dataset and a maximum of 73258 validators, a dataset compiling all committee assignments would have up to 888253250 rows, which is too much. When we need committee information, we’ll pull it from the database and record intermediary datasets instead.
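The row count quoted above follows from simple arithmetic, reproduced here as a quick check. The constant of 32 slots per epoch is the eth2 value; the variable names are ours.

```python
SLOTS_PER_EPOCH = 32

last_slot = 388_000
max_validators = 73_258

# Number of complete epochs covered by the dataset.
epochs = last_slot // SLOTS_PER_EPOCH
# One committee assignment per validator per epoch.
committee_rows = epochs * max_validators

print(epochs, committee_rows)
```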

val_balances

This dataset gives us validator state balances at the beginning of each epoch. Note that the state balance (balance), the actual ETH amount held by a validator, is different from the effective balance (effective_balance), which measures the principal on which validators receive interest.

Computed datasets

To ease the computational demands of this notebook, we record two datasets from which much of the analysis can be derived.

stats_per_val

For each validator, we compute a bunch of statistics, including:

stats_per_slot

We also record summary statistics for each slot. At 388000 slots in our dataset, this remains manageable to query. We have the following fields:

Performance of duties

Attester duties

We compare the number of included attestations with the number of expected attestations.

Clearly something went very wrong circa epoch 2,500. This is now known as the roughtime incident, an issue affecting the major validator client, Prysm. It took time for the network to recover, in the process demonstrating how the quadratic inactivity leak mechanism works. Client diversity FTW!

Proposer duties

How many blocks are there in the canonical chain?

Again, the same trough during the roughtime incident.

Correctness of attestations

Target checkpoint

Attestations vouch for some target checkpoint to justify. We can check whether they vouched for the correct one by comparing their target_block_root with the latest known block root as of the start of the attestation epoch (that’s a mouthful). How many individual attestations correctly attest for the target?
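The check described above can be sketched as follows, assuming we hold the sorted slots of canonical blocks and their roots in two parallel lists. The function and argument names are hypothetical; only `target_block_root` comes from the text.

```python
import bisect

SLOTS_PER_EPOCH = 32

def latest_block_root(block_slots, block_roots, slot):
    """Root of the latest block proposed at or before `slot` (slots sorted)."""
    i = bisect.bisect_right(block_slots, slot) - 1
    return block_roots[i] if i >= 0 else None

def is_target_correct(att_slot, target_block_root, block_slots, block_roots):
    """Compare the vote with the latest block known at the epoch's first slot."""
    epoch_start = (att_slot // SLOTS_PER_EPOCH) * SLOTS_PER_EPOCH
    expected = latest_block_root(block_slots, block_roots, epoch_start)
    return target_block_root == expected

# Toy chain (hypothetical): slot 32 is empty, so epoch 1's target is block "b".
block_slots = [0, 31, 33]
block_roots = ["a", "b", "c"]
print(is_target_correct(40, "b", block_slots, block_roots))
```

The empty slot 32 in the toy chain illustrates why the comparison uses the latest known block rather than the block at the epoch's first slot.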

How does the correctness evolve over time?

Head of the chain

Attestations must also vote for the correct head of the chain, as returned by the GHOST fork choice rule. To check for correctness, one looks at the latest block known as of the attestation slot, att_slot. This block may have been proposed in that very slot. When the beacon_block_root attribute of the attestation and the latest block root match, the head is correct!
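A minimal sketch of the head check, assuming canonical blocks are given as a mapping from slot to block root: we forward-fill the latest root over empty slots, then compare it with each vote. Apart from `att_slot` and `beacon_block_root`, the names are illustrative.

```python
def head_root_by_slot(blocks, last_slot):
    """Map each slot to the root of the latest block at or before it."""
    heads, current = {}, None
    for slot in range(last_slot + 1):
        # Empty slots inherit the previous head.
        current = blocks.get(slot, current)
        heads[slot] = current
    return heads

def is_head_correct(attestation, heads):
    """True when the attestation votes for the latest known block."""
    return attestation["beacon_block_root"] == heads[attestation["att_slot"]]

# Toy chain (hypothetical): slot 2 has no block, so its head is still "b".
blocks = {0: "a", 1: "b", 3: "d"}
heads = head_root_by_slot(blocks, last_slot=4)
print(is_head_correct({"att_slot": 2, "beacon_block_root": "b"}, heads))
```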

How does the correctness evolve over time?

Validator performance

Validators are rewarded for their performance, and penalised for failing to complete their tasks. We start with a crude measure of performance: the number of included attestations. It is a crude measure since (a) we do not account for the timeliness of the validator, measured by the inclusion delay, and (b) we do not check that the attestation’s attributes are correct (with the exception of the source attribute, since an attestation with an incorrect source cannot be included on-chain).

Uptime-rewards curve I: Included attestations

We compare the percentage of included attestations with the (possibly negative) reward obtained by the validator.

Who are the validators getting a negative return? We plot the same, showing how long a validator has been in service.

Recently activated validators have a much more balanced uptime-reward curve, with the higher performers getting positive returns. Meanwhile, validators who were active since the beginning tend to have smaller returns. This can be due to validator fatigue (validating for a while, then turning off the rig), but a fair number of early validators have high attestation performance yet low return. The roughtime incident is likely to blame here. Let’s focus on these early validators.

Inactivity leaks push the uptime-rewards curve downwards. At best, validators can preserve their current balance if they validate optimally, with inclusion delay at 1 always. Most likely, active validators lose a small amount of their balance due to delay or attestation errors, while inactive validators leak much more.

Uptime-rewards curve II: Inclusion delay

We turn our attention to the inclusion delay. Validators are rewarded for attesting in a timely manner, with higher rewards the earlier their attestations are included in a block. We explode aggregates contained in the blocks to trace the earliest included attestation of each validator in an epoch.

Note that the y axis is given on a logarithmic scale. A high number of attestations have a low inclusion delay, which is good! Since attestations cannot be included more than 32 slots from their attesting slot, the distribution above is naturally capped at 32.

How is the inclusion delay correlated with the rewards? We look at validators with at least 70% of included attestations and activated after the roughtime incident to find out.

The plot looks rather homogeneous…

Aggregate attestations

eth2 is built to scale to tens of thousands of validators. This introduces overhead from message passing (and inclusion) when these validators are asked to vote on the canonical chain. To alleviate the beacon chain, votes (a.k.a. individual attestations) can be aggregated.

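In particular, an attestation contains five attributes: the attesting slot, the committee index, the beacon block root (the vote for the head of the chain), the source root and the target root.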

Since we expect validators to broadly agree in times of low latency, we also expect that a lot of individual attestations will share these same five attributes. We can aggregate such a set of individual attestations \(I\) into a single, aggregate, attestation.

When we have \(N\) active validators, about \(N / 32\) are selected to attest for each slot in an epoch. The validators for a slot \(s\) are further divided between a few committees. Identical votes from validators in the same committee can be aggregated. Assume that two aggregate attestations were formed from individual attestations of validators in set \(C(s, c)\), validators in committee \(c\) attesting for slot \(s\). One aggregate contains individual attestations from set \(I \subseteq C(s, c)\) and the other individual attestations from set \(J \subseteq C(s, c)\). We have two cases: if \(I \cap J = \emptyset\), the two aggregates can be combined into a single aggregate with attesting indices \(I \cup J\); if \(I \cap J \neq \emptyset\), they cannot be aggregated further.
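The two cases hinge on whether \(I\) and \(J\) overlap, which the sketch below tests with Python sets. It models validator sets only and ignores signatures; `try_aggregate` is a hypothetical helper name.

```python
def try_aggregate(i_set, j_set):
    """Merge two sets of validators casting the same vote.

    Returns the union when the sets are disjoint, or None when they overlap,
    in which case the two aggregates cannot be combined further.
    """
    if i_set & j_set:
        return None
    return i_set | j_set

print(try_aggregate({1, 2}, {3}))     # disjoint sets: a single aggregate
print(try_aggregate({1, 2}, {2, 3}))  # validator 2 is in both: no merge
```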

How many individual attestations are contained in aggregates?

A fairly high number of aggregate attestations included in a block are actually individual attestations (the very tall bar on the left side of the plot, where number of individual attestations per aggregate is equal to 1). Nonetheless, a significant number of aggregates tally up between 50 and 100 individual attestations.

We can plot the same, weighting by the size of the validator set in the aggregate, to count how many individual attestations each size of aggregates included.

Overall, we can plot the Lorenz curve of aggregate attestations. This allows us to find out the share of attestations held by the 20% largest aggregates.

The answer is 47%.
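For readers unfamiliar with the construction, here is a small sketch of how a Lorenz curve and the share held by the largest 20% can be computed from a list of aggregate sizes. The sizes below are toy values, not the Medalla data, and the function names are ours.

```python
from itertools import accumulate

def lorenz_curve(sizes):
    """Cumulative share of attestations, counting the smallest aggregates first."""
    ordered = sorted(sizes)
    total = sum(ordered)
    return [0.0] + [c / total for c in accumulate(ordered)]

def share_of_largest(sizes, fraction=0.2):
    """Share of individual attestations held by the largest `fraction` of aggregates."""
    curve = lorenz_curve(sizes)
    cutoff = int(len(sizes) * (1 - fraction))
    return 1.0 - curve[cutoff]

# Toy data (hypothetical): eight aggregates of size 1 and two of size 6.
toy_sizes = [1] * 8 + [6, 6]
print(share_of_largest(toy_sizes))
```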

How much savings did aggregates provide?

We compare how many individual attestations exist to how many aggregates were included in blocks.

We have 23.82 times more individual attestations than aggregates, meaning that if we were not aggregating, we would have 23.82 times as much data on-chain.
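The ratio can be re-derived from the two counts reported in the glossary of this notebook (351632707 individual attestations and 14760602 aggregates); the variable names are ours.

```python
individual_attestations = 351_632_707
aggregates = 14_760_602

# How many individual votes each included aggregate carries on average.
savings_ratio = individual_attestations / aggregates
print(round(savings_ratio, 2))
```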

In how many aggregate attestations is a single attestation included?

We look at all individual attestations in our dataset, i.e., individual, unaggregated votes, and measure how many times they were included in an aggregate.

Most attestations were included in an aggregate only once.

How many redundant aggregate attestations are there?

We call myopic redundant those identical aggregate attestations (same five attributes and same set of validator indices) which are included in more than one block. This can happen when a block producer does not see that an aggregate was previously included (e.g., because of latency), or simply when the block producer doesn’t pay attention and greedily adds as many aggregates as they know about.

The mode is 1, which is also the optimal case. A redundant aggregate does not have much purpose apart from bloating the chain.
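A possible way to count these inclusions, sketched with hypothetical field names: key each aggregate by its five attributes plus its exact set of attesting indices, and count the distinct blocks in which that key appears. A count of 1 is the optimal case; anything above signals a myopic redundant aggregate.

```python
from collections import defaultdict

ATTRIBUTES = ("att_slot", "committee_index", "beacon_block_root",
              "source_root", "target_root")

def inclusion_counts(aggregates):
    """Number of distinct blocks including each identical aggregate."""
    blocks_seen = defaultdict(set)
    for agg in aggregates:
        key = tuple(agg[name] for name in ATTRIBUTES)
        key += (frozenset(agg["attesting_indices"]),)
        blocks_seen[key].add(agg["slot"])  # `slot` is the including block's slot
    return {key: len(slots) for key, slots in blocks_seen.items()}

# Toy data (hypothetical): the same aggregate is included in blocks 64 and 66.
vote = {"att_slot": 63, "committee_index": 0, "beacon_block_root": "h",
        "source_root": "s", "target_root": "t"}
toy_aggregates = [
    {**vote, "slot": 64, "attesting_indices": [1, 2]},
    {**vote, "slot": 66, "attesting_indices": [2, 1]},
    {**vote, "slot": 64, "attesting_indices": [3]},
]
counts = inclusion_counts(toy_aggregates)
```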

We could generalise this definition and call redundant an aggregate included in a block for which all of its attesting indices were previously seen in other aggregates. We didn’t compute these as they are much harder to count.

How many times did a block include the exact same aggregate attestation more than once?

We could call these strongly redundant, as this is pure waste.

We see that 229 times, identical aggregates were included twice in a block.

How many aggregates in a block are included by another aggregate in the same block?

We now define subset aggregates. Suppose two aggregates in the same block with equal attributes (slot, committee index, beacon root, source root and target root) include validator sets \(I\) and \(J\) respectively. If we have \(I \subset J\), i.e., if all validators of the first aggregate are also included in the second aggregate (but the reverse is not true), then we call the first aggregate a subset aggregate of the second.

Subset aggregates, much like redundant aggregate attestations (equal aggregates included in more than one block of the canonical chain), can be removed from the finalised chain without losing any voting information. In fact, detecting subset aggregates requires much less information than detecting redundant aggregates. To root out subset aggregates, a client must simply ensure that no aggregate it is prepared to include in a block is a subset aggregate of another. Meanwhile, to root out redundant aggregates, a client must check all past blocks (until the inclusion limit of 32 slots) to ensure that it is not including a redundant aggregate. In a sense, subset aggregates are “worse” as they should be easier to root out.
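The local check a block producer could run is sketched below: group candidate aggregates by their five attributes, then drop any aggregate whose validator set is a proper subset of another in the same group. This is an illustration with hypothetical field names, not any client's actual logic.

```python
from collections import defaultdict

ATTRIBUTES = ("att_slot", "committee_index", "beacon_block_root",
              "source_root", "target_root")

def drop_subset_aggregates(candidates):
    """Return the candidates that are not subset aggregates of another one."""
    groups = defaultdict(list)
    for agg in candidates:
        groups[tuple(agg[name] for name in ATTRIBUTES)].append(agg)
    kept = []
    for same_vote in groups.values():
        sets = [frozenset(a["attesting_indices"]) for a in same_vote]
        for i, agg in enumerate(same_vote):
            # `<` is the proper-subset test, so identical sets are both kept.
            if not any(sets[i] < sets[j] for j in range(len(sets)) if j != i):
                kept.append(agg)
    return kept

# Toy data (hypothetical): [2] is already covered by [1, 2, 3].
vote = {"att_slot": 63, "committee_index": 0, "beacon_block_root": "h",
        "source_root": "s", "target_root": "t"}
candidates = [
    {**vote, "attesting_indices": [1, 2, 3]},
    {**vote, "attesting_indices": [2]},
    {**vote, "attesting_indices": [4]},
]
kept = drop_subset_aggregates(candidates)
```

Exact duplicates survive this filter by design, since they fall under the separate notion of strongly redundant aggregates.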

So among all included aggregates in blocks, how many are subset aggregates? We count these instances for attestations included in blocks until slot 30000.

We find that 3.02% of included aggregates are subset aggregates.

How often are subset aggregates of size 1?

Taking a look at instances of subset aggregates, we often observe that the subset aggregate has size 1. In other words, it is often the case that a “big” aggregate is included, aggregating very many validators, and then a second aggregate of size 1, namely, an individual attestation, is included too, while this second individual attestation is already aggregated by the first, larger aggregate.

A large majority of subset aggregates (96.29%) have size 1.

How many times were clashing attestations included in blocks?

We look at situations where two aggregate attestations are included in the same block, with identical attributes (same attesting slot, attesting committee, beacon chain head, source block and target block) but different attesting indices, and neither one is a subset of the other. We define the following two notions, assuming the two aggregate attestations include attestations of validator sets \(I\) and \(J\) respectively: the aggregates are strongly clashing if \(I \cap J \neq \emptyset\), and weakly clashing if \(I \cap J = \emptyset\).
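Given two validator sets from aggregates with identical attributes in the same block, the classification can be sketched as below. The function name and return labels are ours; the overlap conditions follow the glossary at the end of this notebook.

```python
def clash_type(i_set, j_set):
    """Classify two aggregates with identical attributes in the same block.

    Returns None when one set contains the other (equal or subset aggregates),
    "strong" when the sets partially overlap, "weak" when they are disjoint.
    """
    if i_set <= j_set or j_set <= i_set:
        return None
    return "strong" if i_set & j_set else "weak"

print(clash_type({1, 2}, {2, 3}))  # partial overlap: cannot be merged
print(clash_type({1, 2}, {3, 4}))  # disjoint: could have been merged
```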

Let’s first count how many aggregates are strongly clashing in blocks before slot 30000.

How many are weakly clashing?

Note that optimally aggregating a set of aggregates is NP-complete! Here is a reduction of the optimal aggregation problem to graph colouring. Set aggregate attestations as vertices in a graph, with an edge drawn between two vertices if the validator sets of the two aggregates have a non-empty overlap. In graph colouring, we look for the minimum number of colours necessary to assign a colour to each vertex such that no two connected vertices share a colour. All vertices that share a colour have pairwise empty overlaps, and thus can be combined into a single aggregate. The minimum number of colours necessary to colour the graph tells us the fewest aggregates into which a given set of aggregates can be combined.
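Since the exact problem is hard, a client would more plausibly rely on a heuristic. The sketch below is a simple greedy colouring, which is our choice for illustration and carries no optimality guarantee: each validator set joins the first group it does not overlap with, and each group corresponds to one colour.

```python
def greedy_aggregate(validator_sets):
    """Greedily merge pairwise-disjoint validator sets into as few groups as it can."""
    groups = []  # each entry is the union of the sets sharing one colour
    # Placing the largest sets first is a common greedy ordering.
    for current in sorted(validator_sets, key=len, reverse=True):
        for idx, merged in enumerate(groups):
            # Disjoint from the union means disjoint from every member.
            if not (merged & current):
                groups[idx] = merged | current
                break
        else:
            groups.append(set(current))
    return groups

# Toy data (hypothetical): {1, 2} and {2, 3} overlap, so two groups are needed.
toy_sets = [{1, 2}, {2, 3}, {4}, {5, 6}]
print(greedy_aggregate(toy_sets))
```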

Aggregates glossary

We’ve looked at aggregate attestations in a few different ways. We offer here a table to summarise the definitions we have introduced and associated statistics.

| Name | Explanation | Statistics | Recommendation |
|---|---|---|---|
| Aggregate | Attestation summarising the vote of validators in a single committee | There are 14760602 aggregates included from slot 0 to slot 388000 | x |
| Individual attestation | A single validator vote | There are 351632707 individual attestations | x |
| Savings ratio | The ratio of individual attestations to aggregate attestations | The savings ratio is 23.82 | Keep it up! |
| Redundant aggregate | An aggregate containing validator attestations which were all already included on-chain, possibly across several aggregates with different attesting indices | x | Don’t include these |
| Myopic redundant aggregate | An aggregate included more than once on-chain, always with the same attesting indices | There are 1317231 myopic redundant aggregates, 8.92% of all aggregates | These are redundant too: don’t include them either |

In the next table, we present definitions classifying aggregates when two or more instances are included in the same block with the same five attributes (attesting slot and committee, beacon root, source root and target root).

| Name | Explanation | Statistics | Recommendation |
|---|---|---|---|
| Strongly redundant aggregate | An aggregate included more than once in the same block | There are 229 strongly redundant aggregates | Keep only one of the strongly redundant aggregates |
| Subset aggregate | If not strongly redundant, an aggregate fully contained in another aggregate included in the same block | There are 43666 subset aggregates until slot 30000, 3.02% of all aggregates until slot 30000 | Drop all subset aggregates |
| Strongly clashing aggregates | If not a subset aggregate, an aggregate with attesting indices \(I\) such that there exists another aggregate attesting for the same in the same block with attesting indices \(J\) and \(I \cap J \neq \emptyset\) | There are 68639 strongly clashing aggregates until slot 30000, 4.75% of all aggregates until slot 30000 | These cannot be aggregated further. Do nothing |
| Weakly clashing aggregates | If not a strongly clashing aggregate, an aggregate with attesting indices \(I\) such that there exists another aggregate attesting for the same in the same block with attesting indices \(J\) | There are 380059 weakly clashing aggregates until slot 30000, 26.3% of all aggregates until slot 30000 | These can be aggregated further into one aggregate with attesting indices \(I \cup J\). In an ideal world, we have 0 weakly clashing aggregates |

Size one aggregates appear often in the dataset.

| Name | Explanation | Statistics | Recommendation |
|---|---|---|---|
| Subset aggregate of size 1 | A subset aggregate which is an unaggregated individual attestation | There are 42044 subset aggregates of size 1 until slot 30000, 96.29% of all subset aggregates until slot 30000 | Definitely drop these |
| Aggregate of size 1 | An individual attestation included without being aggregated | There are 1125630 aggregates of size 1 | Either it is weakly clashing, so aggregate it further; or it is a subset aggregate, so drop it; or it is a redundant |